XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
The XNOR-Net binarization approach seeks to identify the most accurate convolutional
approximations. Specifically, XNOR-Net employs a scaling factor, which plays a vital role
in the learning of BNNs and improves the forward pass as:
$$\mathbf{a}^{n}_{\mathrm{out}} \approx \boldsymbol{\alpha}^{n} \circ \left(\mathbf{b}^{n}_{\mathbf{w}} \odot \mathbf{b}^{n}_{\mathbf{a}_{\mathrm{in}}}\right), \tag{3.3}$$
where $\boldsymbol{\alpha}^{n} = \{\alpha^{n}_{1}, \alpha^{n}_{2}, \dots, \alpha^{n}_{C^{n}_{\mathrm{out}}}\} \in \mathbb{R}^{C^{n}_{\mathrm{out}}}_{+}$ is known as the channel-wise scaling factor vector
to mitigate the output gap between Eq. (3.1) and its approximation in Eq. (3.3). We denote
$\mathcal{A} = \{\boldsymbol{\alpha}^{n}\}_{n=1}^{N}$. Since the weight values are binary, XNOR-Net can implement the convolution
with additions and subtractions. In the following, we state the XNOR operation for a
specific convolution layer, thus omitting the superscript $n$ for simplicity. Most existing
implementations simply follow earlier studies [199, 159] to optimize $\mathcal{A}$ based on non-parametric
optimization as:
$$\alpha^{*}, \mathbf{b}^{*}_{\mathbf{w}} = \arg\min_{\alpha,\,\mathbf{b}_{\mathbf{w}}} J(\alpha, \mathbf{b}_{\mathbf{w}}), \tag{3.4}$$
$$J(\alpha, \mathbf{b}_{\mathbf{w}}) = \left\|\mathbf{w} - \alpha \circ \mathbf{b}_{\mathbf{w}}\right\|_{2}^{2}. \tag{3.5}$$
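The statement above that binary weights let XNOR-Net replace multiplications with additions and subtractions can be checked on a toy inner product: when both operands are in $\{-1, +1\}$ and encoded as bits, the dot product equals $2 \cdot \mathrm{popcount}(\mathrm{XNOR}) - M$. A minimal NumPy sketch with illustrative sizes (not the original implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 64  # elements per filter, i.e. C_in * K * K (toy size)

# random ±1 weight and activation vectors
bw = rng.choice([-1.0, 1.0], size=M)
ba = rng.choice([-1.0, 1.0], size=M)

# bit-encode: +1 -> True, -1 -> False
w_bits = bw > 0
a_bits = ba > 0

# XNOR + popcount: matching bits contribute +1, mismatches -1
matches = int(np.sum(w_bits == a_bits))  # popcount of the XNOR result
dot_xnor = 2 * matches - M

assert dot_xnor == int(bw @ ba)  # equals the ±1 dot product exactly
```

On hardware, the boolean comparison becomes a bitwise XNOR over packed words followed by a population count, which is where the speedup of binary convolution comes from.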
By expanding Eq. 3.5, we have:
$$J(\alpha, \mathbf{b}_{\mathbf{w}}) = \alpha^{2}(\mathbf{b}_{\mathbf{w}})^{\mathrm{T}}\mathbf{b}_{\mathbf{w}} - 2\alpha \circ \mathbf{w}^{\mathrm{T}}\mathbf{b}_{\mathbf{w}} + \mathbf{w}^{\mathrm{T}}\mathbf{w}, \tag{3.6}$$
where $\mathbf{b}_{\mathbf{w}} \in \mathbb{B}$. Thus, $(\mathbf{b}_{\mathbf{w}})^{\mathrm{T}}\mathbf{b}_{\mathbf{w}} = C_{\mathrm{in}} \times K \times K$, and $\mathbf{w}^{\mathrm{T}}\mathbf{w}$ is also a constant since $\mathbf{w}$ is
a known variable. Thus, Eq. (3.6) can be rewritten as:
$$J(\alpha, \mathbf{b}_{\mathbf{w}}) = \alpha^{2} \times C_{\mathrm{in}} \times K \times K - 2\alpha \circ \mathbf{w}^{\mathrm{T}}\mathbf{b}_{\mathbf{w}} + \mathrm{constant}. \tag{3.7}$$
The optimal $\mathbf{b}_{\mathbf{w}}$ can then be obtained by solving the following constrained maximization:
$$\mathbf{b}^{*}_{\mathbf{w}} = \arg\max_{\mathbf{b}_{\mathbf{w}}} \mathbf{w}^{\mathrm{T}}\mathbf{b}_{\mathbf{w}}, \quad \mathrm{s.t.} \ \mathbf{b}_{\mathbf{w}} \in \mathbb{B}, \tag{3.8}$$
which can be solved by the sign function:
$$b_{w_{i}} = \begin{cases} +1, & w_{i} \geq 0, \\ -1, & w_{i} < 0, \end{cases}$$
which is the optimal solution and is also widely used as a general binarization solution in
numerous subsequent BNN works [159]. To find the optimal value of the scaling factor $\alpha^{*}$,
we take the derivative of $J(\cdot)$ w.r.t. $\alpha$ and set it to zero:
$$\alpha^{*} = \frac{\mathbf{w}^{\mathrm{T}}\mathbf{b}_{\mathbf{w}}}{C_{\mathrm{in}} \times K \times K}. \tag{3.9}$$
By replacing $\mathbf{b}_{\mathbf{w}}$ with the sign function, a closed-form solution of $\alpha$ can be derived via the
channel-wise absolute mean (CAM) as:
$$\alpha_{i} = \frac{\|\mathbf{w}_{i,:,:,:}\|_{1}}{C_{\mathrm{in}} \times K \times K}. \tag{3.10}$$
Therefore, the optimal estimate of a binary weight filter can be achieved simply by taking
the sign of the weight values, and the optimal scaling factor is the average of the absolute
weight values.
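As a numerical sanity check on this derivation, the following NumPy sketch (toy shapes, not the original implementation) computes $\mathbf{b}_{\mathbf{w}} = \mathrm{sign}(\mathbf{w})$ and the CAM scaling factors, then verifies that no randomly perturbed pair $(\alpha, \mathbf{b}_{\mathbf{w}})$ attains a lower objective $J$ of Eq. (3.5):

```python
import numpy as np

rng = np.random.default_rng(0)
C_out, C_in, K = 4, 3, 3  # toy filter-bank shape (illustrative)
w = rng.standard_normal((C_out, C_in, K, K))

# optimal binary weights: element-wise sign (solution of Eq. (3.8))
bw = np.where(w >= 0, 1.0, -1.0)

# optimal scaling: channel-wise absolute mean (Eq. (3.10))
alpha = np.abs(w).reshape(C_out, -1).mean(axis=1)

def J(alpha, bw):
    """Total squared error ||w - alpha * bw||_2^2, summed over channels (Eq. (3.5))."""
    diff = w - alpha[:, None, None, None] * bw
    return float(np.sum(diff ** 2))

j_star = J(alpha, bw)

# any perturbation of alpha or random sign flips in bw should not do better
for _ in range(100):
    a2 = alpha * rng.uniform(0.5, 1.5, size=C_out)
    b2 = bw * np.where(rng.random(bw.shape) < 0.1, -1.0, 1.0)
    assert j_star <= J(a2, b2) + 1e-12
```

Because the pair (sign, CAM) is the joint global minimizer of $J$ per output channel, every perturbed candidate is at best tied, which the assertions confirm empirically.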
Based on the explicitly solved $\alpha^{*}$, the training objective of XNOR-Net-like BNNs
is given in a bilevel form:
$$\mathbf{W}^{*} = \arg\min_{\mathbf{W}} \mathcal{L}(\mathbf{W}; \mathcal{A}^{*}), \quad \mathrm{s.t.} \ \alpha^{n*}, \mathbf{b}^{n*}_{\mathbf{w}} = \arg\min_{\alpha^{n},\,\mathbf{b}^{n}_{\mathbf{w}}} J(\alpha^{n}, \mathbf{b}^{n}_{\mathbf{w}}), \tag{3.11}$$
which is also known as hard binarization [159]. In the following, we show some variants of
such a binarization function.